The goal of this tutorial is to explore how to fit hidden Markov models
(HMMs) to movement data. To do so, we will investigate the R package
momentuHMM. This package builds on a slightly older package,
moveHMM, which was developed by Théo Michelot, Roland Langrock, and
Toby Patterson; see the associated paper:
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12578.
momentuHMM was developed by Brett McClintock and Théo Michelot.
momentuHMM has new features, such as allowing for more data streams,
inclusion of covariates as raster layers, and much more; see the associated
paper:
https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12995.
The primary learning objectives in this first tutorial are to:
momentuHMM (\(\approx 30\) min)

First, let's load the packages that we will need to complete the analyses. Of course, you need to have them installed first.
While the raster package is being phased out in favour of the newer terra package, momentuHMM still relies on raster to extract covariates. For this tutorial we will therefore still use raster, but do start working with terra in your own work where possible. Make sure to set your working directory to the "CANSSI_OTN_HMM_2023" folder of the HMM workshop materials:
One of the main assumptions of the data used with HMMs is that locations are taken at regular time steps and that position error is negligible. For example, HMMs are well suited to GPS tags that take locations at a set temporal frequency (e.g., every \(2\) hrs). Without reprocessing, HMMs are not particularly good for irregular time series of locations or locations with large measurement error (e.g., you would need to preprocess Argos data before applying HMMs).
Unfortunately, movement data for aquatic species are rarely collected at regular time intervals and without measurement error. For example, the data set we will work on is the movement track of three narwhals tagged with Fastloc GPS tags for two weeks, provided by Dr. Marianne Marcoux.
First, let’s import the narwhal movement data. For simplicity, we only examine the GPS data from two weeks in August 2017 for three individuals.
The function ymd_hms() transforms dates stored as character or numeric vectors into POSIXct objects, which are useful and commonly used in ecology.
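For instance, a minimal example (assuming the lubridate package, which provides ymd_hms(), is loaded):

```r
library(lubridate)

# convert a character timestamp to a POSIXct object (UTC by default)
t1 <- ymd_hms("2017-08-01 12:30:00")
class(t1)  # "POSIXct" "POSIXt"
```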
The data we obtain are often quite messy, with some records missing information and others duplicated. We can keep only complete location data by filtering with !is.na(x) & !is.na(y). To remove duplicate records (same time, location, and location class), we will use the lag function from dplyr, which returns the value from the previous row so that we can compare it to the current row.
tracks_gps <- tracks_gps %>%
# remove missing locations
filter(!is.na(x) & !is.na(y),
# remove identical records
!(time == lag(time) & x == lag(x) & y == lag(y) & loc_class == lag(loc_class)))

For the main tutorial, since we only use the Fastloc GPS data, we don't have to deal with location error.
Now we will plot the tracks with the package ggOceanMaps, which will show the tracks on a map. To do so, we need to choose the limits of the map; we set them to the longitude and latitude limits of our data (bbox below). This package is convenient since it works directly with longitude and latitude, so there is no need to transform our dataset (other methods usually require transforming the dataset into an sf object). In the crs argument, we define the coordinate reference system of the original data (in this case WGS84, which is defined by the EPSG code 4326).
#Transform the dataset into a data.frame with longitude, latitude and ID of the animal
df <- cbind.data.frame(Lon = c(tracks_gps$x),Lat = c(tracks_gps$y), ID = c(tracks_gps$ID))
#Define the geographical limits (box) of the background map: c(min_lon,max_lon,min_lat,max_lat)
bbox <-c(min(tracks_gps$x),max(tracks_gps$x),min(tracks_gps$y),max(tracks_gps$y))
#Plot the data, with background map
basemap(limits = c(bbox)) +
geom_spatial_path(data=df, aes(x = Lon, y = Lat,colour=factor(ID)), crs = 4326)+
ggtitle("Movement data of the narwhals")
We can also add the bathymetry to the map, to get a better sense of the movement, by adding the argument bathymetry = TRUE in the basemap function. To do so, we will need to zoom out.
df <- cbind.data.frame(Lon = c(tracks_gps$x),Lat = c(tracks_gps$y), ID = c(tracks_gps$ID))
bbox <-c(min(tracks_gps$x)-5,max(tracks_gps$x)+5,min(tracks_gps$y)-1,max(tracks_gps$y)+1)
#Plot the data with bathymetry, replace the code by the correct argument
basemap(limits = c(bbox),bathymetry= TRUE) +
geom_spatial_path(data=df, aes(x = Lon, y = Lat,colour=factor(ID)), crs = 4326)+
ggtitle("Movement data of the narwhals")

The classic HMM assumes the observations are collected in discrete time and that there are no missing data in the predictor variables. There are two key decisions we must make: (1) the temporal resolution to use, and (2) how to address data gaps. The desired resolution depends predominantly on the biological question you are asking, as different behaviours and biological processes occur at different spatial and temporal scales (e.g., seasonal migration, daily movement between foraging and resting grounds, and fine-scale foraging decisions). Generally, higher-resolution data are preferred as they contain more information. However, the resolution can also be too high, such that fine-scale variability drowns out the signal from coarse-scale patterns of interest (e.g., seasonal migration). In this case, we will be linking the movement data with high-resolution (75 s) dive data to look at finer-scale behaviours (on the order of a few hours). My rule of thumb is that you want 3-50 data points per behaviour. For behaviours spanning several hours, that roughly corresponds to a desired resolution between 2 min and 60 min.
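The rule of thumb above can be turned into quick arithmetic. The helper below is purely illustrative (res_range is not part of any package): for a behaviour lasting a given number of minutes, it returns the finest and coarsest resolutions that keep 3-50 points per bout.

```r
# illustrative helper: resolutions (min) giving n_min to n_max data points
# per behavioural bout of duration dur_min minutes
res_range <- function(dur_min, n_min = 3, n_max = 50) {
  c(finest = dur_min / n_max, coarsest = dur_min / n_min)
}

res_range(120)  # a ~2 h behaviour: 2.4 to 40 min
```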
First, let's calculate the time difference between successive records using difftime and lead (which compares the current row to the following row) and place these values in a new column called dt. Note that the time difference is in minutes (units = "mins"). For the last record of each individual (i.e., when ID != lead(ID)), we will set the value to NA.
# Calculate time difference between locations
tracks_gps <- tracks_gps %>%
mutate(dt = ifelse(ID == lead(ID), # If next data row is same individual
# calculate time difference
difftime(lead(time), time, units = "mins"), NA))

Let's see what resolutions may be possible in the data by looking at the most frequent time gaps.
# Zoom in on short intervals
hist(tracks_gps$dt, 1000, main = NA, xlab = "Time difference (min)", xlim = c(0,100))

# identify the most frequent dt
tracks_gps %>%
{table(.$dt)} %>%
sort(decreasing = TRUE) %>%
head()
##
## 10 11 12 22 9 13
## 400 145 90 64 63 63
We see that the most frequent time gap is \(10\) min, followed by \(11\), \(12\), \(22\), \(9\), and \(13\) min. We also see that the majority of the gaps are \(< 60\) min; however, some are in excess of \(600\) min. Because HMMs assume observations are taken at regular time intervals, finer resolutions will contain more data gaps that would need to be interpolated. Frequent and large data gaps can be difficult to handle, especially as the number of missing data points approaches or exceeds the number of existing data points. Let's examine the potential data structure at different resolutions for the different animals.
We first create a function that approximates the proportion of missing locations we would have for a given resolution.

# Make function to estimate proportion of missing locations
p_na <- function(time, res) {
time <- round_date(time, paste(res,"min")) # round to nearest resolution
time <- na.omit(time[time != lead(time)]) # remove duplicate time
# calculate maximum number of locations
max_n_loc <- length(seq(time[1], tail(time, 1) + res*60,
by = as.difftime(res, units = "mins")))
n_NA <- max_n_loc - (length(time)+1)
n_NA / max_n_loc
}

We can now use this function to look at the proportion of NAs we would get with 10, 20, 30, and 60 min resolutions.
# summarise track dt
tracks_gps %>%
group_by(ID) %>%
summarise(p_NA_10m = p_na(time, 10), # 10 min
p_NA_20m = p_na(time, 20), # 20 min
p_NA_30m = p_na(time, 30), # 30 min
p_NA_60m = p_na(time, 60)) %>% # 60 min
# return formatted table
kable(digits = 3, col.names = c("Narwhal ID", paste(c(10,20,30,60), "m"))) %>%
kable_styling() %>%
add_header_above(c("", "Resolution" = 4))

| Narwhal ID | 10 m | 20 m | 30 m | 60 m |
|---|---|---|---|---|
| T172062 | 0.486 | 0.236 | 0.191 | 0.152 |
| T172064 | 0.502 | 0.203 | 0.120 | 0.074 |
| T172066 | 0.488 | 0.230 | 0.185 | 0.160 |
Here we see that, at the \(10\) min resolution, around \(50\%\) of the locations would be missing. This is a lot, but if we chose finer resolutions, interpolated data would outweigh real data, which may bias the results. For this tutorial, we will use a \(10\) min resolution.
There are several ways to deal with data gaps, and we will address two:
One strategy to address data gaps is to leave the data streams (i.e., step length and turning angle) as NAs. In my own work, I have typically voided sections with \(>6\) missing locations. The maximum gap size to allow depends on the frequency of the missing data, the frequency of locations, the study species, and the behaviours of interest.
Voiding gaps is particularly useful for moderate and large gaps; however, it may often be the best option for small gaps as well. The only catch is that although HMMs can accommodate missing data in the state-dependent observation data (e.g., step length and turning angle), HMMs cannot have missing data in covariates on the transition probabilities (for example, if there were an effect of bathymetry on state probability).
In this case, we will void the data streams from locations predicted within gaps \(>60\) min. First, we will identify which steps need to be voided; then we will prepare the data and void the estimated step and angle data streams. We will do this again later in the tutorial, so we will wrap it into a function called prepData_NAGaps. The function will output a momentuHMMData object that can be fit directly with fitHMM.
This method will not work on data with multiple individuals. To get around this, we can split the data into a list where each element contains the location data for one ID. We can then use the R apply family of functions to process each animal separately.
# split tracks into a list with one element per individual ID
tracks_gps_ls <- tracks_gps %>%
split(., .$ID) # split into list

# define function to replace step and turn angle of large gaps with NA
NA_gaps <- function(tracks, times){
# rows where location is within a large gap
rows <- which(
rowSums(apply(times, 1, function(X, tracks){
dplyr::between(tracks,
as.POSIXct(X[1], tz = "UTC"),
as.POSIXct(X[2], tz = "UTC"))
}, tracks$time))==1)
tracks$step[rows] <- NA
tracks$angle[rows] <- NA
return(tracks)
}
# define function to identify and void gaps
prepData_NAGaps <- function(track_list, tracks_crw, res, max_gap, ...){
# for each ID, identify which rows have gaps >= max_gap
gaps_ls_rows <- lapply(track_list, function(x){
which(difftime(lead(x$time), x$time, units = "mins") >= max_gap)
})
# create a sequence of gap start/end times for each track from gaps >= max_gap
gap_ls <- mapply(FUN = function(track, gaps){
# identify start and end date of each gap
gaps_ls_srt_end <- list(start = ceiling_date(track$time[gaps], paste(res, "min")),
end = floor_date(track$time[gaps+1], paste(res, "min")))
# combine into a data.frame of start/end times for each track
data.frame(start = gaps_ls_srt_end$start, end = gaps_ls_srt_end$end)
},
track_list, gaps_ls_rows, SIMPLIFY = FALSE)
# prep data and list by ID
prep_tracks <- prepData(tracks_crw, ...) %>%
{split(., .$ID)}
# void the step and angle data streams within the identified gaps
mapply(FUN = NA_gaps, prep_tracks, gap_ls,
SIMPLIFY = FALSE) %>%
do.call("rbind", .) # convert list of tracks back to a single data.frame
}
prep_tracks_gps_crw_NAgaps <- prepData_NAGaps(tracks_gps_ls, tracks_gps_crw,
10, 60, type = "UTM")
The main idea is to use tracks_gps_ls to locate the large gaps with the NA_gaps function and then
fill them with NAs in the tracks_gps_crw dataset. We use tracks_gps_crw because its smaller gaps are already filled and its time resolution is already set. The result is stored in prep_tracks_gps_crw_NAgaps.
Another strategy to deal with larger gaps is to segment the tracks and assign each segment a new individual ID. This may be particularly appropriate for gaps across which we may reasonably expect the underlying states to be effectively independent of one another. Specifically, we may ask: over what period of time does the behaviour of the animal affect its subsequent behaviour? In this case, we may expect that the behaviour of a narwhal depends on its behaviour over the preceding several hours, but is independent after 24 hours. We can split the tracks at gaps larger than a predetermined threshold by incrementing the ID column. For each segmented track, it is then possible to void the large gaps and fit a correlated random walk over the smaller gaps, as we did before.
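A minimal sketch of this segmentation strategy (the function name split_large_gaps and the 24 h threshold are illustrative, not from any package; it assumes a data.frame with the tutorial's ID and time columns):

```r
# relabel IDs so that every gap larger than max_gap minutes
# starts a new "individual" (segment) for the HMM
split_large_gaps <- function(tracks, max_gap = 24 * 60) {
  tracks <- tracks[order(tracks$ID, tracks$time), ]
  n <- nrow(tracks)
  dt <- c(NA, as.numeric(difftime(tracks$time[-1], tracks$time[-n],
                                  units = "mins")))
  same_id <- c(FALSE, tracks$ID[-1] == tracks$ID[-n])
  # a new segment starts at the first row of each individual
  # and after every gap exceeding max_gap
  new_seg <- !same_id | dt > max_gap
  new_seg[is.na(new_seg)] <- TRUE
  tracks$ID <- paste(tracks$ID, cumsum(new_seg), sep = "-")
  tracks
}
```

Each segment then enters fitHMM as its own ID, so no transition probabilities are estimated across the voided gaps.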
In the next part of this tutorial, we will explore how to fit hidden Markov models
(HMMs) to the prep_tracks_gps_crw_NAgaps dataset, again using the R package momentuHMM introduced above.
The primary learning objectives of this part are:
momentuHMM
Now that our data are ready to use, we can fit the HMM. For this we use
the function fitHMM. This is where we make many of our
modelling decisions, and most of these decisions are associated with
one of the arguments of the function. The minimum arguments fitHMM
requires are: fitHMM(data, nbStates, dist, Par0).
First, when we fit an HMM, we need to decide on the number of behavioural states we are going to model. To start simple, we will only use two behavioural states. These could represent, for example, one behaviour with fast and directed movement (e.g., travelling) and one behaviour with slower and more tortuous movement (e.g., residing). This means that the argument nbStates will be set to \(2\).
Second, we need to decide whether to make the transition probabilities between the behavioural states dependent on covariates. Here, we will start simple and will not use covariates.
Third, we need to select the distributions we will use to model the step lengths and the turning angles. For now, we will use the gamma distribution for the step lengths and the von Mises for the turning angles. These are commonly used distributions for movement data. This means that the argument dist will be set to: list(step = "gamma", angle = "vm"). Note that dist should be a list with one item for each data stream column in the data that we are modelling (here, the columns step and angle). The gamma distribution is strictly positive (i.e., it does not allow for \(0\)s). If you have step lengths that are exactly zero in your data set, you need to use a zero-inflated distribution. But in this case, we have no zeros.
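A quick way to check whether your data contain exact zeros is sketched below; the helper name is illustrative, not part of momentuHMM:

```r
# illustrative check: does the step-length vector contain exact zeros,
# which would require a zero-inflated distribution?
needs_zero_inflation <- function(step) {
  any(step == 0, na.rm = TRUE)
}

needs_zero_inflation(c(10.2, 0, 35.1))   # TRUE
needs_zero_inflation(c(10.2, NA, 35.1))  # FALSE
```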
By default, fitHMM sets the argument estAngleMean to NULL, which means that we assume the mean angle is \(0\) for both behaviours (i.e., the animal has a tendency to continue in the same direction) and that only the angle concentration parameters differ. The concentration parameters control the extent to which the animal continues forward versus turns. This reduces the number of parameters to estimate. These are all very important decisions that you must make when you construct your model.

In addition, we need to specify initial values for the parameters to estimate. HMMs are fit using maximum likelihood estimation (MLE) with a numerical optimizer. An unfortunate aspect of fitting models with numerical optimizers is that, to explore the parameter space and find the best parameter values for the model, the optimizer needs to start somewhere, and you need to decide where it starts. Choosing bad starting values can result in parameter estimates that are not the MLE, but just a local maximum. To choose your initial parameters, you can take a quick peek at the data (e.g., using the plots below) and use some general biological information. For example, it is common for animals to have one behaviour with long step lengths and small turning angles and another with short step lengths and larger turning angles. From the plots below (note that we can simply use plot on our data), it looks like the animal has step lengths that are close to \(100\) and \(600\). There could be a third behaviour with larger step lengths, but for now we will consider only two.
plot(prep_tracks_gps_crw_NAgaps$step ~
prep_tracks_gps_crw_NAgaps$time, ty = "l", ylab = "Step length",
xlab = "Date", las = 1)
abline(h = 100, col = rgb(1, 0, 0, 0.5))
abline(h = 600, col = rgb(1, 0, 0, 0.5))

So let's choose \(100\) and \(600\) for the mean step lengths and use half those values for the standard deviations. The turning angles are either very close to \(0\) or spread fairly widely from \(-\pi/2\) to \(\pi/2\). High concentration parameter values (\(\kappa\), pronounced "kappa") mean that the animal has a tendency to keep moving in the same direction. Values close to \(0\) mean that the turning angle distribution is almost uniform (the animal turns in all directions). Note that \(\kappa\) cannot be exactly \(0\), so let's choose \(0.1\) and \(1\).
# define states (optional)
stateNames <- c("resident", "travel")
nbState <- length(stateNames)
# define distribution to use for each data stream
dist <- list(step = "gamma", angle = "vm")
# Setting up the starting values
mu0 <- c(100, 600) # Mean step length
sigma0 <- mu0/2 # SD of the step length
kappa0 <- c(0.1, 1) # Turn angle concentration parameter
# combine starting parameters
Par0 <- list(step = c(mu0, sigma0), angle = kappa0)

OK, we are ready. Let's fit the HMM and look at the parameter estimates.
# Fit a 2 state HMM
HMM_gps_crw_NAgaps <- fitHMM(prep_tracks_gps_crw_NAgaps,
stateNames = stateNames, nbStates = nbState,
dist = dist, Par0 = Par0)
# Let's look at parameter estimates
HMM_gps_crw_NAgaps

## Value of the maximum log-likelihood: -17884.91
##
##
## step parameters:
## ----------------
## resident travel
## mean 282.0578 754.6505
## sd 153.8772 251.0847
##
## angle parameters:
## -----------------
## resident travel
## mean 0.000000 0.00000
## concentration 2.277154 18.01728
##
## Regression coeffs for the transition probabilities:
## ---------------------------------------------------
## 1 -> 2 2 -> 1
## (Intercept) -2.601642 -2.563678
##
## Transition probability matrix:
## ------------------------------
## resident travel
## resident 0.93096717 0.06903283
## travel 0.07151297 0.92848703
##
## Initial distribution:
## ---------------------
## resident travel
## 0.3025454 0.6974546
We can also plot them:

plot(HMM_gps_crw_NAgaps)
## Decoding state sequence... DONE
Based on the mean step parameters, it looks like the first behavioural state (resident) has smaller step lengths than state 2. This is particularly easy to see in the step histogram (first figure), where the estimated distribution for each state is overlaid on the observed step length frequencies. The turning angle distribution of the second state (travel) indicates much more directed movement; its concentration parameter is much larger than that of state 1. Looking at the track, we can see the estimated transitions between states.
Maybe looking at the state sequence in more detail will help us understand. In fact, we are often interested in identifying when an animal is in each of the behavioural states (here, when the animal is in state 1 vs state 2), something we call state decoding. To do so, we can use the function viterbi, which uses the Viterbi algorithm to produce the most likely sequence of states given your fitted model and data.
# Apply the Viterbi algorithm using your fitted model object
dec_States <- viterbi(HMM_gps_crw_NAgaps)
# Let's look at predicted states of the first 20 time steps
head(dec_States, 20)

## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2

# Count the number of time steps decoded in each state
table(dec_States)

## dec_States
## 1 2
## 1476 1446
In many cases, it is more interesting to get the probability of being in each state rather than the most likely state. To do so, you can use the function stateProbs, which returns a matrix of the probability of being in each state for each time step.
# Calculate the probability of being in each state
statep <- stateProbs(HMM_gps_crw_NAgaps)
# Let's look at the state probability matrix
head(statep)

## resident travel
## [1,] 0.9944545 5.545518e-03
## [2,] 0.9999782 2.183941e-05
## [3,] 0.9948147 5.185280e-03
## [4,] 0.6298961 3.701039e-01
## [5,] 0.7077569 2.922431e-01
## [6,] 0.9999950 4.965047e-06
We can see here the probability of being in each state for the first \(6\) time steps. Here, the probability of being in state 1 is really high for these steps, but that may not always be the case. Sometimes you might have values closer to \(0.5\), which would indicate that, for that time step, it is hard to identify which state the animal is in (i.e., which step length and turning angle distributions fit best).
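For example, a small sketch to flag such ambiguous time steps (the function name and the 0.75 threshold are arbitrary choices, not part of momentuHMM):

```r
# flag time steps where no state reaches the probability threshold,
# i.e. where the state assignment is ambiguous
uncertain_steps <- function(state_probs, threshold = 0.75) {
  which(apply(state_probs, 1, max) < threshold)
}

# toy 2-state probability matrix: row 2 is ambiguous
p <- matrix(c(0.99, 0.55, 0.01, 0.45), ncol = 2)
uncertain_steps(p)  # 2
```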
You can plot the results from both of these functions using the function plotStates:

plotStates(HMM_gps_crw_NAgaps)
## Decoding state sequence... DONE
## Computing state probabilities... DONE
The first thing we need to check is whether the parameter estimates are reliable. As mentioned above, the initial parameter values can affect the estimation procedure, so it is always good to check whether you get similar results with different starting values. Here, we will make the starting values of the two behavioural states closer to one another.
# Setting up the starting values
mu2 <- c(400, 600) # Mean step length
sigma2 <- mu2/2 # SD of the step length
kappa2 <- c(1, 1) # Turn angle concentration parameter
# combine starting parameters
Par2 <- list(step = c(mu2, sigma2), angle = kappa2)
# Fit the same 2 state HMM
HMM_gps_crw_NAgaps2 <- fitHMM(prep_tracks_gps_crw_NAgaps,
stateNames = stateNames, nbStates = 2,
dist = dist, Par0 = Par2)

Let's compare the two results. First, let's look at which of the two has the lowest negative log-likelihood (equivalent to the highest log-likelihood, so closest to the real MLE). Let's also look at the parameter estimates they each returned.
# Negative log likelihood
c(original = HMM_gps_crw_NAgaps$mod$minimum, new = HMM_gps_crw_NAgaps2$mod$minimum)

## original new
## 17884.91 17884.91
## resident travel resident travel
## mean 282.0578 754.6505 282.0578 754.6505
## sd 153.8772 251.0847 153.8772 251.0847
## resident travel resident travel
## mean 0.000000 0.00000 0.000000 0.00000
## concentration 2.277154 18.01728 2.277154 18.01728
## resident travel resident travel
## resident 0.93096717 0.06903283 0.93096717 0.06903283
## travel 0.07151297 0.92848703 0.07151297 0.92848703
Looks like they both returned very similar values for everything, so that's good! The function fitHMM also has the argument retryFits, which perturbs the parameter estimates and retries fitting the model. We use it when we think the estimates could be at a local maximum. Since the optimization of HMMs depends on the starting values chosen, it is always good to explore multiple initial values and then choose the best estimates among them (which is what retryFits does). The argument indicates the number of times you want to perturb the parameters and retry fitting the model (you can choose the size of the perturbation by setting the argument retrySD).
# Fit the same 2-state HMM with retryFits
# This is a random perturbation, so setting the seed to get the same results
set.seed(29)
HMM_RF <- fitHMM(prep_tracks_gps_crw_NAgaps,
nbState = 2,
dist=list(step="gamma", angle="vm"),
Par0 = list(step=c(mu2, sigma2), angle=kappa2),
retryFits=10)

## Attempting to improve fit using random perturbation. Press 'esc' to force exit from 'fitHMM'
##
Attempt 1 of 10 -- current log-likelihood value: -17884.91
Attempt 2 of 10 -- current log-likelihood value: -17884.91
Attempt 3 of 10 -- current log-likelihood value: -17884.91
Attempt 4 of 10 -- current log-likelihood value: -17884.91
Attempt 5 of 10 -- current log-likelihood value: -17884.91
Attempt 6 of 10 -- current log-likelihood value: -17884.91
Attempt 7 of 10 -- current log-likelihood value: -17884.91
Attempt 8 of 10 -- current log-likelihood value: -17884.91
Attempt 9 of 10 -- current log-likelihood value: -17884.91
Attempt 10 of 10 -- current log-likelihood value: -17884.91
Let’s compare the results again.
# Negative log likelihood
c(original = HMM_gps_crw_NAgaps$mod$minimum, new = HMM_gps_crw_NAgaps2$mod$minimum,
retryFits = HMM_RF$mod$minimum)

## original new retryFits
## 17884.91 17884.91 17884.91
# Parameter estimates
cbind(HMM_gps_crw_NAgaps$mle$step, HMM_gps_crw_NAgaps2$mle$step, HMM_RF$mle$step)

## resident travel resident travel state 1 state 2
## mean 282.0578 754.6505 282.0578 754.6505 282.0581 754.6507
## sd 153.8772 251.0847 153.8772 251.0847 153.8775 251.0848
## resident travel resident travel state 1 state 2
## mean 0.000000 0.00000 0.000000 0.00000 0.00000 0.00000
## concentration 2.277154 18.01728 2.277154 18.01728 2.27715 18.01732
## resident travel resident travel state 1 state 2
## resident 0.93096717 0.06903283 0.93096717 0.06903283 0.93096679 0.06903321
## travel 0.07151297 0.92848703 0.07151297 0.92848703 0.07151399 0.92848601
Still looking very similar!
momentuHMM has functions to help select initial parameters for complex models. However, these functions rely on the user choosing initial parameter values for a simple model, like the one we just explored.
We have fitted a model to the data, the parameter estimates look reliable, and we can get state probabilities. But is this a good model for our data? Can we use the model results? The best way to investigate model fit is through pseudo-residuals. Pseudo-residuals are a type of model residual that accounts for the interdependence of observations. They are calculated for each of the time series (e.g., you will have pseudo-residuals for the step length time series and for the turning angle time series). If the fitted model is appropriate, the pseudo-residuals produced by the function pseudoRes should follow a standard normal distribution. You can look at pseudo-residuals directly via the function plotPR, which plots the pseudo-residual time series, the qq-plots, and the autocorrelation functions (ACF):

plotPR(HMM_gps_crw_NAgaps)
## Computing pseudo-residuals...
Both the qq-plot and the ACF plot indicate that the model does not fit the step length time series particularly well. The ACF indicates that severe autocorrelation remains in the time series, even after accounting for the persistence in the underlying behavioural states.
This could indicate that there are more hidden behavioural states. Let's try a 3-state HMM.

# Setting up the starting values
mu3 <- c(100, 600, 1000) # Mean step length
sigma3 <- mu3/2 # SD of the step length
kappa3 <- c(0.1, 1, 1) # Turn angle concentration parameter
# combine starting parameters
Par3 <- list(step = c(mu3, sigma3), angle = kappa3)
# Fit a 3-state HMM
HMM_gps_crw_NAgaps3 <- fitHMM(prep_tracks_gps_crw_NAgaps,
nbStates = 3,
dist = dist, Par0 = Par3)
plot(HMM_gps_crw_NAgaps3)

## Decoding state sequence... DONE
If we look at the pseudo-residuals:

plotPR(HMM_gps_crw_NAgaps3)
## Computing pseudo-residuals...
It's better. Not perfect, but there is less unexplained autocorrelation, especially in the step lengths. If we look at the step length distributions, we can see that we still have a state with really small steps (maybe resting?), a state with medium-sized steps (maybe foraging?), and a state with large steps (maybe travelling?). Looking at the track, the movement in state \(2\) does look more tortuous and concentrated in focal areas, while the movement in state \(3\) is very directed and spans larger areas.